You
can think of SSIS as a data import/export/transformation layer in the
overall system architecture that you are deploying for at least most of
your Microsoft-based applications and a few non-Microsoft applications
(see Figure 1).
SSIS allows you to “data enable” almost all the individual applications
or systems that are part of your overall implementation, such as OLTP
databases, multidimensional cubes, OLAP data warehouses, Excel files,
Access databases, flat files, other heterogeneous database sources, and
even web services. The Integration Services object model includes both
native and managed APIs for doing most SSIS work. This includes APIs
for any of the SSIS tools, the command-line utilities, and even custom
applications. SSIS Designer and the Integration Services Wizard both
use the Integration Services object model. SSIS includes the
integration service itself (that is, the service that manages all SSIS
packages), the Integration Services object model, the SSIS runtime and
runtime executables, and the data flow task (which has a data flow
engine, source, transformation, and destination components).
Microsoft
uses SSIS packages to implement any data movement/transformation.
Basically, Microsoft treats SSIS packages as if they are managed code
and requires that you create Integration Services projects and
deployment utilities as part of managing these SSIS packages. In
addition, separate Integration Services Connection projects can now be
created to aid in connection and provider services. All in all, this is
a very good approach that significantly reduces errors and allows you
to go through a reasonably formal release to production (that is,
development and deployment) cycle.
SSIS packages contain a
collection of connections, control flow elements, data flow elements,
event handlers, variables, and configurations. They take the form of
tasks, containers, transformations, and workflows. SSIS packages go
through one or more steps that are either executed sequentially or in
parallel at package execution time. In a nutshell, when an SSIS package
is executed, it does the following:
1. | Connects to any identified data source
| 2. | Copies data (and database objects, if needed)
| 3. | Transforms data
| 4. | Disconnects from the data sources
| 5. | Notifies users, processes, and even other packages of events (such as sending an email when something is done or has errors)
|
The basic SSIS package consists of the following:
SSIS packages—
A package is a discrete, named collection of connections, control flow,
and data flows that implement data movement/data transformation. SSIS control flow and tasks—
One or more tasks and containers drive what the package does. You
organize control flow based on what you want the package to do. Tasks
are the actions taken to accomplish the desired data transformation and
movement. A task can execute any SQL statement, send mail, bulk insert
data, execute an ActiveX script, run a Visual Studio Tool for
Application script (VSTA), or launch another package or an external
program. SSIS containers— A container groups one or more related tasks that you want to manage together (and reuse together). Workflows—
Workflows are definable precedence constraints that allow you to link
two tasks, based on whether the first task executes, executes
successfully, or executes unsuccessfully. Workflow containers are the
wrappers for the tasks and are the means for the flow of control. A
task can run alone, parallel to another task, or sequentially,
according to precedence constraints. Precedence constraints are of
three types: Unconditional—It does not matter whether the preceding step failed or succeeded. On success—The preceding step must have been successful for the execution of the next step. On failure—This constraint returns the appropriate error.
SSIS data flow— The
data flow identifies the sources and destinations that extract and load
data; identifies the transformations that manipulate or enhance the
data; and provides the paths that link sources, transformations, and
destinations. SSIS data flow task— A data flow task creates, orders, and runs the data flows themselves, using a data flow engine. SSIS transformations—
Transformations are one or more functions or operations applied against
a piece of data before the data arrives at the destination.
In SSIS, everything is pretty
much a task or a collection of tasks (one or more containers, tasks in
containers), as you can see in Figure 2.
Control flow determines the overall execution of the package and data
flows that access the data, transform it, and write it. Precedence
constraints determine the overall control flow—connecting the
executables, containers, and tasks into an ordered control flow.
SSIS also has several objects that extend package functionality:
SSIS event handlers—
These workflow tasks run in response to events raised by a package,
task, or container. This is much the same as most programming
languages, such as Java or C#. If a task (or package or container) has
some issue (that is, raises an event), the event handler can be used to
handle the issue appropriately. Typical events in data transformation
processing that need to be handled with event handlers might include
connections not being established, disk space issues, and so on. You can even have the event handlers write out emails or initiate other workflows. SSIS configurations—
These objects are used to help parameterize many of the previously
hard-bound characteristics of packages at runtime. When a package is
run, the configuration information is loaded (updating the values of
the package’s properties), and then the package is run using the new
configuration values (all without having to modify the package). SSIS
configurations use the classic property/value pair paradigm to
represent the properties that are to be configurable. Following are the
varied methods of representing configuration files: XML configuration file—This
file identifies the configuration property/value pairs for any number
of configuration values. The following sample XML configuration file is
for a package named UnleashedPackage with a property of PKGVar: <?xml version="1.0"?> <DTSConfiguration> <DTSConfigurationHeading> <DTSConfigurationFileInfo GeneratedBy="DatabaseArchitechs\PBertucci" GeneratedFromPackageName="UnleashedPackage" GeneratedFromPackageID="{3GV09721-816B-4E28-9878-0DE37A150234}" GeneratedDate="7/09/2009 7:12:22 AM"/> </DTSConfigurationHeading> <Configuration ConfiguredType="Property" Path="\Package.Variables[User::PKGVar].Value" ValueType="Int32"> <ConfiguredValue>0</ConfiguredValue> </Configuration> </DTSConfiguration>
A
configuration header contains information about the configuration file.
This element includes attributes such as when the file was created and
the name of the person who generated the file. In addition, a
configuration element contains information about each configuration.
This element includes attributes such as the property path and
configured value of a property. Configuration table in SQL Server—This table stores configuration entries for use by the packages. Environment variables (VARs)—These can be referenced by the package. Parent package VARs—These can be used by child packages. Entry in Registry—The Registry can also contain the configuration values.
SSIS Logging— Logging
can be done from any task or package to write out any type of logging
information desired. When a supplied logging provider is used, a
package can provide a rich runtime history. Logs are associated with
packages (that is, the reference point), but any task (or container)
can write to any package’s log. In this way, it is possible to have
consolidated logs of a driver package with the full execution history
of all child packages. The log providers (out of the box) write to a
flat file (text file) or to SQL Server tables. Other custom logging
providers can be used, though. You can log what you need to log—start
date/time, end date/time, records transformed, errors, and so on. SSIS variables—
SSIS has both system variables and user-defined variables. System
variables provide runtime package object information to tasks or other
packages. This information is helpful when you want to reference these
system variables to help decide what to do next. (They can be used in
expressions, scripts, and configurations.) User-defined variables are
really for specialized variables that are not found as system variables
and only have to be used within a package’s scope. Again, these
variables can be used in expressions, scripts, and configurations
within a package.
SSIS packages can run other
packages. This capability is very helpful when you want to granularly
break out common data transformations for reuse by many different
higher-level solutions (that is, higher-level packages that execute
common-detail-level transformation packages).
Note
When an SSIS package is first created, it is given a globally unique identifier (GUID) that is added to the package’s ID property and a name that is added to its NAME
property. After these are created, they become part of the reference
mechanism for the package itself. If you ever copy a package as the
basis of a new package, you have to rename these two properties so they
are unique (that is, new GUID and new NAME property). If you simply want to give an existing package a new NAME or ID value, you can do so directly or with the dtutil command-line utility.
You can also create
packages that can be restarted at a point of failure, including
restarting specific tasks within a package (and not all the tasks in a
package). This is a super addition to SQL Server 2008. If a package had
more than one data flow task and one completed but the others didn’t,
you could restart just the data flow tasks that had not completed
without rerunning the ones that had worked fine. Long-running packages
can also create checkpoints to provide milestones from which to
restart. This capability will save many sleepless nights for the folks
doing production support for data transformation processing.
|